Generating Scientific Documentation for Computational Experiments Using Provenance

Authors

  • Adianto Wibisono
  • Peter Bloem
  • Gerben de Vries
  • Paul T. Groth
  • Adam Belloum
  • Marian Bubak
Abstract

Electronic notebooks are a common mechanism for scientists to document and investigate their work. With the advent of tools such as IPython Notebooks and Knitr, these notebooks allow code and data to be mixed together and published online. However, these approaches assume that all work is done in the same notebook environment. In this work, we look at generating notebook documentation from multi-environment workflows by using provenance represented in the W3C PROV model. Specifically, using PROV generated from the Ducktape workflow system, we are able to generate IPython notebooks that include results tables, provenance visualizations, and references to the software and datasets used. The notebooks are interactive and editable, so that the user can explore and analyze the results of the experiment without re-running the workflow. We identify specific extensions to PROV necessary for facilitating documentation generation. To evaluate our approach, we recreate the documentation website for a paper that won the Open Science Award at the ECML/PKDD 2013 machine learning conference. We show that the documentation produced automatically by our system provides more detail and greater experimental insight than the original hand-crafted documentation. Our approach bridges the gap between user-friendly notebook documentation and provenance generated by distributed heterogeneous components.
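To make the idea in the abstract concrete, the following is a minimal sketch of turning a provenance record into a notebook file. It is not the authors' implementation: the provenance dictionary only loosely follows the PROV-JSON layout, the `ex:` identifiers, the `ex:value` attribute, and all values are made-up placeholders, and the helper names (`markdown_cell`, `code_cell`, `describe`, `notebook_from_prov`) are hypothetical. The notebook is built as plain JSON in the version 4 notebook format, so only the Python standard library is needed.

```python
import json

# Hypothetical provenance record, loosely following the PROV-JSON layout.
# All identifiers, labels, and values are illustrative placeholders.
prov = {
    "entity": {
        "ex:dataset": {"prov:label": "input dataset"},
        "ex:results": {
            "prov:label": "results table",
            "ex:value": [["run", "score"], ["placeholder-run", 0.0]],
        },
    },
    "activity": {"ex:experiment": {"prov:label": "experiment run"}},
    "used": {"_:u1": {"prov:activity": "ex:experiment", "prov:entity": "ex:dataset"}},
    "wasGeneratedBy": {"_:g1": {"prov:entity": "ex:results", "prov:activity": "ex:experiment"}},
}


def markdown_cell(text):
    """Build a markdown cell in the version 4 notebook JSON format."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build an unexecuted code cell in the version 4 notebook JSON format."""
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "outputs": [], "source": source}


def describe(prov, entity_id):
    """Render one entity as a markdown bullet with its label and identifier."""
    label = prov["entity"][entity_id].get("prov:label", entity_id)
    return f"- {label} (`{entity_id}`)"


def notebook_from_prov(prov):
    """Create one documentation section per activity, listing inputs and outputs."""
    cells = [markdown_cell("# Experiment documentation generated from provenance")]
    for act_id, act in sorted(prov.get("activity", {}).items()):
        inputs = sorted(r["prov:entity"] for r in prov.get("used", {}).values()
                        if r["prov:activity"] == act_id)
        outputs = sorted(r["prov:entity"] for r in prov.get("wasGeneratedBy", {}).values()
                         if r["prov:activity"] == act_id)
        lines = [f"## {act.get('prov:label', act_id)}", "", "Inputs:"]
        lines += [describe(prov, e) for e in inputs]
        lines += ["", "Outputs:"]
        lines += [describe(prov, e) for e in outputs]
        cells.append(markdown_cell("\n".join(lines)))
        # Embed recorded values in code cells so they can be explored
        # in the notebook without re-running the workflow.
        for e in outputs:
            value = prov["entity"][e].get("ex:value")
            if value is not None:
                cells.append(code_cell(f"results = {value!r}\nresults"))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = notebook_from_prov(prov)
notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with an `.ipynb` extension should give a notebook that can be opened and edited. Embedding recorded values as code cells is one possible way to support exploration without re-running the workflow; the system described in the abstract may handle this differently.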


Similar resources

Using Provenance to support Good Laboratory Practice in Grid Environments

Conducting experiments and documenting results is daily business of scientists. Good and traceable documentation enables other scientists to confirm procedures and results for increased credibility. Documentation and scientific conduct are regulated and termed as “good laboratory practice.” Laboratory notebooks are used to record each step in conducting an experiment and processing data. Origin...

Applying Provenance to Protect Attribution in Distributed Computational Scientific Experiments

The automation of large scale computational scientific experiments can be accomplished with the use of scientific workflow management systems, which allow for the definition of their activities and data dependencies. The manual analysis of the data resulting from their execution is burdensome, due to the usually large amounts of information. Provenance systems can be used to support this task s...

A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers

As publishers establish a greater online presence as well as infrastructure to support the distribution of more varied information, the idea of an executable paper that enables greater interaction has developed. An executable paper provides more information for computational experiments and results than the text, tables, and figures of standard papers. Executable papers can bundle computational...

Publishing Provenance-rich Scientific Papers

Complete documentation and reproducibility of results are important goals for scientific publications. Standard scientific papers, however, usually contain only final results and document only parameters and processing steps that the authors considered important enough. By recording the complete provenance history of the data leading to a publication one can overcome this limitation and allow r...

Formalising a protocol for recording provenance in Grids

Both the scientific and business communities are beginning to rely on Grids as problem-solving mechanisms. These communities also have requirements in terms of provenance. Provenance is the documentation of process and the necessity for it is apparent in fields ranging from medicine to aerospace. To support provenance capture in Grids, we have developed an implementation-independent protocol for...


Publication date: 2014